Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method in which, instead of fitting the likelihood $\log p(x)$ for the training data, we fit the score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood -- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated -- i.e. the Poincar\'e, log-Sobolev and isoperimetric constants -- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant -- even for simple families of distributions like exponential families with rich enough sufficient statistics -- score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
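To make the idea of fitting the score rather than the likelihood concrete, the following is a minimal sketch for a one-dimensional Gaussian. It assumes the standard Hyvärinen form of the score matching objective, $\mathbb{E}[\tfrac{1}{2} s(x)^2 + s'(x)]$ with $s(x) = \partial_x \log p(x)$. The Gaussian family, the function names and the synthetic data are illustrative assumptions, not taken from the paper. A Gaussian's normalizing constant is of course known, so the example only shows that the objective never uses it; it says nothing about the efficiency results described above.

```python
import random


def score_matching_objective(xs, mu, var):
    """Empirical Hyvarinen objective for N(mu, var).

    Averages 0.5 * s(x)^2 + s'(x), where s(x) = -(x - mu) / var is the
    score. The normalizing constant of the density never appears.
    """
    total = 0.0
    for x in xs:
        score = -(x - mu) / var
        dscore = -1.0 / var
        total += 0.5 * score * score + dscore
    return total / len(xs)


def fit_gaussian_score_matching(xs):
    """Closed-form minimizer of the objective above (illustrative helper).

    Setting the gradient to zero gives the sample mean and the (biased)
    sample variance, which for this family coincide with the MLE.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


# Synthetic data; the true parameters (mean 2.0, std 1.5) are made up.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(5000)]
mu_hat, var_hat = fit_gaussian_score_matching(data)
print(mu_hat, var_hat, score_matching_objective(data, mu_hat, var_hat))
```

For richer families the minimizer generally has no closed form and the objective is minimized numerically, but the structure is the same: only the score and its derivative are evaluated.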
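Variational Autoencoders (VAEs) are one of the most commonly used generative models, particularly for image data. A prominent difficulty in training VAEs is data that is supported on a lower-dimensional manifold. Recent work by Dai and Wipf (2019) suggests that on low-dimensional data, the generator will converge to a solution with 0 variance which is correctly supported on the ground truth manifold. In this paper, via a combination of theoretical and empirical results, we show that the story is more subtle. Precisely, we show that for linear encoders/decoders, the story is mostly true and VAE training does recover a generator with support equal to the ground truth manifold, but this is due to the implicit bias of gradient descent rather than merely the VAE loss itself. In the nonlinear case, we show that VAE training frequently learns a higher-dimensional manifold which is a superset of the ground truth manifold.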
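Domain generalization aims at performing well on unseen test environments using data from a limited number of training environments. Despite a proliferation of proposed algorithms for this task, assessing their performance both theoretically and empirically is still very challenging. Distribution matching algorithms such as (Conditional) Domain Adversarial Networks [Ganin et al., 2016, Long et al., 2018] are popular and enjoy empirical success, but they lack formal guarantees. Other approaches such as Invariant Risk Minimization (IRM) require a prohibitively large number of training environments -- linear in the dimension of the spurious feature space $d_s$ -- even on simple data models like the one proposed by [Rosenfeld et al., 2021]. Under a variant of this model, we show that both ERM and IRM cannot generalize with $o(d_s)$ environments. We then present an iterative feature matching algorithm that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments. Our results provide the first theoretical justification for a family of distribution matching algorithms widely used in practice, under a concrete nontrivial data model.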
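A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- an objective broadly known as domain generalization. A common belief is that ERM can interpolate but not extrapolate and that the latter is considerably more difficult, but these claims are vague and lack formal justification. In this work, we recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions. Under an existing notion of interpolation and extrapolation based on reweighting of sub-group likelihoods, we rigorously demonstrate that extrapolation is computationally much harder than interpolation, though their statistical complexity is not significantly different. Furthermore, we show that ERM -- or a noisy variant -- is provably minimax-optimal for both tasks. Our framework presents a new avenue for the formal analysis of domain generalization algorithms, which may be of independent interest.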
Eco-driving strategies have been shown to provide significant reductions in fuel consumption. This paper outlines an active driver assistance approach that uses a residual policy learning (RPL) agent trained to provide residual actions to default powertrain controllers while balancing fuel consumption against other driver-accommodation objectives. Using previous experiences, our RPL agent learns improved traction torque and gear shifting residual policies to adapt the operation of the powertrain to variations and uncertainties in the environment. For comparison, we consider a traditional reinforcement learning (RL) agent trained from scratch. Both agents employ the off-policy Maximum A Posteriori Policy Optimization algorithm with an actor-critic architecture. Implementing both agents on a simulated commercial vehicle in various car-following scenarios, we find that the RPL agent quickly learns policies that significantly improve on a baseline source policy, but that on some measures are not as good as those eventually reached by the RL agent trained from scratch.
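The residual structure described above, in which a learned correction is added to the output of a default controller, can be sketched as follows. This is only an illustration under stated assumptions: the controller gains, the linear form of the residual, the weight values and the torque limit are all hypothetical, and the linear map merely stands in for the learned actor, which the abstract says is trained with Maximum A Posteriori Policy Optimization.

```python
def base_torque(gap_m, rel_speed_mps):
    """Hypothetical default controller: torque request (N*m) computed from
    the gap to the lead vehicle and the relative speed. Gains are made up."""
    return 10.0 * (gap_m - 30.0) + 50.0 * rel_speed_mps


def residual_torque(gap_m, rel_speed_mps, weights):
    """Stand-in for the learned residual policy: a linear map of the same
    inputs. A real agent would use a trained network here."""
    w_gap, w_speed, bias = weights
    return w_gap * gap_m + w_speed * rel_speed_mps + bias


def combined_torque(gap_m, rel_speed_mps, weights, limit=500.0):
    """Residual policy learning action: default action plus learned
    correction, clipped to a hypothetical actuator limit."""
    raw = base_torque(gap_m, rel_speed_mps) + residual_torque(
        gap_m, rel_speed_mps, weights
    )
    return max(-limit, min(limit, raw))


untrained = (0.0, 0.0, 0.0)   # zero residual: behaves like the default controller
trained = (-0.5, 5.0, -20.0)  # made-up weights standing in for a trained residual

print(combined_torque(40.0, -1.0, untrained))
print(combined_torque(40.0, -1.0, trained))
```

With zero residual weights the combined action equals the default controller's action, so in this sketch the agent starts from the baseline behavior rather than from a random policy; this is one plausible reading of why a residual agent can improve quickly.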
With the growing need to reduce energy consumption and greenhouse gas emissions, Eco-driving strategies provide a significant opportunity for additional fuel savings on top of other technological solutions being pursued in the transportation sector. In this paper, a model-free deep reinforcement learning (RL) control agent is proposed for active Eco-driving assistance that trades off fuel consumption against other driver-accommodation objectives, and learns optimal traction torque and transmission shifting policies from experience. The training scheme for the proposed RL agent uses an off-policy actor-critic architecture that iteratively performs policy evaluation with a multi-step return and policy improvement with the maximum a posteriori policy optimization algorithm for hybrid action spaces. The proposed Eco-driving RL agent is implemented on a commercial vehicle in car-following traffic. It shows superior performance in minimizing fuel consumption compared to a baseline controller that has full knowledge of fuel-efficiency tables.
This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering, while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins -- but leaves the joint unspecified -- it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a `full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the number of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves performance comparable to other multi-scan, point cloud-only methods.
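This paper presents a summary of the Competition on Face Morphing Attack Detection Based on Privacy-aware Synthetic Training Data (SYN-MAD), held at the 2022 International Joint Conference on Biometrics (IJCB 2022). The competition attracted a total of 12 participating teams from academia and industry, based in 11 different countries. In the end, seven valid submissions were submitted by the participating teams and evaluated by the organizers. The competition was held to present and attract solutions that deal with detecting face morphing attacks while protecting people's privacy for ethical and legal reasons. To ensure this, the training data was limited to synthetic data provided by the organizers. The submitted solutions presented innovations that led to outperforming the considered baseline in many experimental settings. The evaluation benchmark is now available at: https://github.com/marcohuber/syn-mad-2022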
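Extraterrestrial rovers with a general-purpose robotic arm have many potential applications in lunar and planetary exploration. Introducing autonomy into such systems is desirable for increasing the time that rovers can spend gathering scientific data and collecting samples. This work investigates the applicability of deep reinforcement learning for vision-based robotic grasping of objects on the Moon. A novel simulation environment with procedurally generated datasets is created to train agents in unstructured scenes with uneven terrain and harsh illumination. A model-free off-policy actor-critic algorithm is then employed for end-to-end learning of a policy that directly maps compact octree observations to continuous actions in Cartesian space. Experimental evaluation indicates that 3D data representations enable more effective learning of manipulation skills when compared to traditionally used image-based observations. Domain randomization improves the generalization of learned policies to novel scenes with previously unseen objects and different illumination conditions. To this end, we demonstrate zero-shot sim-to-real transfer by evaluating trained agents on a real robot in a Moon-analogue facility.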